Code
# Load necessary packages
pacman::p_load(tidyverse, sf, spdep, GWmodel, tmap, caret, ranger, httr, jsonlite, ggplot2, ggpubr, ggstatsplot, units, olsrr, gtsummary, SpatialML,Metrics)Luo Yuming
October 24, 2024
November 15, 2024
This exercise focuses on predicting HDB resale prices in Singapore using geographically weighted machine learning methods. By incorporating spatial components, we aim to understand regional patterns and improve prediction accuracy.
The goals of this exercise are to:
In this exercise, we will use the following packages:
| Package | Description |
|---|---|
| sf | Provides functions for reading, processing, and visualizing spatial data in the “Simple Features” format, enabling spatial data handling in R. |
| spdep | Provides tools for spatial dependency modeling, including spatial weights and measures for spatial autocorrelation, such as Moran’s I, useful for detecting spatial patterns. |
| tidyverse | A suite of R packages designed for data manipulation (dplyr, tidyr), visualization (ggplot2), and other common data science tasks, improving data handling and analysis. |
| tmap | A flexible package for creating static and interactive maps, allowing cartographic-quality visualizations of spatial data. |
| GWmodel | Contains functions for Geographically Weighted Regression (GWR) and other spatially weighted models, allowing local modeling of spatial data where relationships can vary across geographic space. |
| caret | A comprehensive package for machine learning, providing tools for model training, tuning, and evaluation, supporting methods like cross-validation and hyperparameter tuning. |
| ranger | An efficient implementation of the random forest algorithm optimized for large datasets, used for predictive modeling and capable of handling both classification and regression tasks. |
| httr | Facilitates HTTP requests in R, useful for connecting to APIs like OneMap to retrieve geographic coordinates based on addresses. |
| jsonlite | A package for working with JSON data in R, enabling easy conversion of JSON data from web APIs into R data frames. |
| Dataset Name | Description | Format | Source |
|---|---|---|---|
| Master Plan 2019 Subzone | Geospatial data representing the boundaries of different regions in Singapore. | Shapefile | Singapore Land Authority |
| HDB Resale Transactions | Aspatial data containing information on HDB resale prices, house type and location. | CSV | Singapore Open Data Portal |
| Various Facilities Data | Includes data on the location of elderly care centres, supermarkets, hawker centres, parks, etc. for analysing the impact of proximity on price. | GeoJSON | Various open sources in Singapore |
| MRT/Bus Stations | Data on the location of MRT and Bus stations in Singapore is used to calculate the impact of commuting convenience on house prices. | Shapefile | Singapore Land Transport Authority |
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`/Users/lucasluo/Desktop/SMU/Courses/Term3 Aug-Dec/ISSS626-Geospatial Analytics and Applications/lucasluo6/ISSS626/Take-home_Ex/Take-home_Ex03/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
Reading layer `BusStop' from data source
`/Users/lucasluo/Desktop/SMU/Courses/Term3 Aug-Dec/ISSS626-Geospatial Analytics and Applications/lucasluo6/ISSS626/Take-home_Ex/Take-home_Ex03/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5166 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48285.52 ymax: 52983.82
Projected CRS: SVY21
Reading layer `RapidTransitSystemStation' from data source
`/Users/lucasluo/Desktop/SMU/Courses/Term3 Aug-Dec/ISSS626-Geospatial Analytics and Applications/lucasluo6/ISSS626/Take-home_Ex/Take-home_Ex03/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 231 features and 7 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 6068.209 ymin: 27478.44 xmax: 45377.5 ymax: 47913.58
Projected CRS: SVY21
This step filters the train station dataset to retain only actual station locations, removing entries such as depots or non-station facilities. We use grepl("STATION|MRT", STN_NAM_DE) to select records where STN_NAM_DE contains “STATION”or”MRT”, ensuring that only transit stations are included for accurate spatial analysis.
Reading layer `HawkerCentresGEOJSON' from data source
`/Users/lucasluo/Desktop/SMU/Courses/Term3 Aug-Dec/ISSS626-Geospatial Analytics and Applications/lucasluo6/ISSS626/Take-home_Ex/Take-home_Ex03/data/geospatial/HawkerCentresGEOJSON.geojson'
using driver `GeoJSON'
Simple feature collection with 125 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6974 ymin: 1.272716 xmax: 103.9882 ymax: 1.449017
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Reading layer `CHASClinics' from data source
`/Users/lucasluo/Desktop/SMU/Courses/Term3 Aug-Dec/ISSS626-Geospatial Analytics and Applications/lucasluo6/ISSS626/Take-home_Ex/Take-home_Ex03/data/geospatial/CHASClinics.geojson'
using driver `GeoJSON'
Simple feature collection with 1193 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.5818 ymin: 1.016264 xmax: 103.9903 ymax: 1.456037
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Reading layer `SupermarketsGEOJSON' from data source
`/Users/lucasluo/Desktop/SMU/Courses/Term3 Aug-Dec/ISSS626-Geospatial Analytics and Applications/lucasluo6/ISSS626/Take-home_Ex/Take-home_Ex03/data/geospatial/SupermarketsGEOJSON.geojson'
using driver `GeoJSON'
Simple feature collection with 526 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6258 ymin: 1.24715 xmax: 104.0036 ymax: 1.461526
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Reading layer `LTASchoolZone' from data source
`/Users/lucasluo/Desktop/SMU/Courses/Term3 Aug-Dec/ISSS626-Geospatial Analytics and Applications/lucasluo6/ISSS626/Take-home_Ex/Take-home_Ex03/data/geospatial/LTASchoolZone.geojson'
using driver `GeoJSON'
Simple feature collection with 211 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY, XYZ
Bounding box: xmin: 103.687 ymin: 1.272736 xmax: 103.9668 ymax: 1.457587
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Note loads the resale data and filters it to only include transactions from January 2023 to September 2024. glimpse() provides an overview of the data structure and variables.
get_coords <- function(add_list){
# Create a data frame to store all retrieved coordinates
postal_coords <- data.frame()
for (i in add_list){
#print(i)
r <- GET('https://www.onemap.gov.sg/api/common/elastic/search?',
query=list(searchVal=i,
returnGeom='Y',
getAddrDetails='Y'))
data <- fromJSON(rawToChar(r$content))
found <- data$found
res <- data$results
# Create a new data frame for each address
new_row <- data.frame()
# If single result, append
if (found == 1){
postal <- res$POSTAL
lat <- res$LATITUDE
lng <- res$LONGITUDE
new_row <- data.frame(address= i,
postal = postal,
latitude = lat,
longitude = lng)
}
# If multiple results, drop NIL and append top 1
else if (found > 1){
# Remove those with NIL as postal
res_sub <- res[res$POSTAL != "NIL", ]
# Set as NA first if no Postal
if (nrow(res_sub) == 0) {
new_row <- data.frame(address= i,
postal = NA,
latitude = NA,
longitude = NA)
}
else{
top1 <- head(res_sub, n = 1)
postal <- top1$POSTAL
lat <- top1$LATITUDE
lng <- top1$LONGITUDE
new_row <- data.frame(address= i,
postal = postal,
latitude = lat,
longitude = lng)
}
}
else {
new_row <- data.frame(address= i,
postal = NA,
latitude = NA,
longitude = NA)
}
# Add the row
postal_coords <- rbind(postal_coords, new_row)
}
return(postal_coords)
}We’ll use the st_zm() function to remove the z-dimension from the data.
# Check for invalid geometries
datasets <- list(mpsz = mpsz, bus_stops = bus_stops, train_stations = train_stations,
hawker_center = hawker_center, clinics = clinics,
supermarkets = supermarkets, schoolzone = schoolzone)
for (name in names(datasets)) {
invalid_count <- sum(!st_is_valid(datasets[[name]]))
cat("Number of invalid geometries in", name, ":", invalid_count, "\n")
}Number of invalid geometries in mpsz : 9
Number of invalid geometries in bus_stops : 0
Number of invalid geometries in train_stations : 2
Number of invalid geometries in hawker_center : 0
Number of invalid geometries in clinics : 0
Number of invalid geometries in supermarkets : 0
Number of invalid geometries in schoolzone : 0
# Repair invalid geometries
for (name in names(datasets)) {
if (sum(!st_is_valid(datasets[[name]])) > 0) {
datasets[[name]] <- st_make_valid(datasets[[name]])
cat("Repaired invalid geometries in", name, "\n")
}
# Check again to confirm all geometries are now valid
invalid_count <- sum(!st_is_valid(datasets[[name]]))
cat("After repair, number of invalid geometries in", name, ":", invalid_count, "\n")
}Repaired invalid geometries in mpsz
After repair, number of invalid geometries in mpsz : 0
After repair, number of invalid geometries in bus_stops : 0
Repaired invalid geometries in train_stations
After repair, number of invalid geometries in train_stations : 0
After repair, number of invalid geometries in hawker_center : 0
After repair, number of invalid geometries in clinics : 0
After repair, number of invalid geometries in supermarkets : 0
After repair, number of invalid geometries in schoolzone : 0
Note This transformation combines block and street name into a single address field and extracts remaining_lease_yr and remaining_lease_mth for further analysis.
Note filters the dataset to only include September 2024 transactions and extracts unique addresses to be geocoded.
Note This function get_coords uses the OneMap API to retrieve coordinates (latitude and longitude) for each unique address in add_list. It handles cases where multiple or no results are returned for an address.
Note The geocoded coordinates are saved to an RDS file to avoid re-running the API calls, making future analyses more efficient.
Note This code joins the geocoded coordinates with the filtered resale data, creating a geospatial dataset with latitude and longitude fields for further spatial analysis.
# Enable automatic fixing of invalid polygons
tmap_options(check.and.fix = TRUE)
# Switch to interactive view mode
tmap_mode("view")
# Interactive map with mpsz as base
tm_shape(mpsz) +
tm_polygons(col = "white", border.col = "grey") +
# Add layers interactively
tm_shape(resale_geo) +
tm_symbols(col = "yellow", size = 0.1, shape = 19) +
tm_shape(clinics) +
tm_symbols(col = "blue", size = 0.1, shape = 19) +
tm_shape(hawker_center) +
tm_symbols(col = "red", size = 0.15, shape = 21) +
tm_shape(train_stations) +
tm_symbols(col = "green", size = 0.1, shape = 19) +
tm_shape(bus_stops) +
tm_symbols(col = "grey", size = 0.05, shape = 20) +
tm_shape(supermarkets) +
tm_symbols(col = "brown", size = 0.1, shape = 19) +
tm_layout(title = "Interactive Map of Amenities in Singapore")